Navigating the Pitfalls of Active Learning Evaluation: A Systematic Framework for Meaningful Performance Assessment
Active Learning (AL) aims to reduce the labeling burden by interactively selecting the most informative samples from a pool of unlabeled data. While there has been extensive research on improving AL query methods in recent years, some studies have questioned the effectiveness of AL compared to emerging paradigms such as semi-supervised (Semi-SL) and self-supervised learning (Self-SL), or a simple optimization of classifier configurations. Thus, today's AL literature presents an inconsistent and contradictory landscape, leaving practitioners uncertain about whether and how to use AL in their tasks. In this work, we make the case that this inconsistency arises from a lack of systematic and realistic evaluation of AL methods. Specifically, we identify five key pitfalls in the current literature that reflect the delicate considerations required for AL evaluation. Further, we present an evaluation framework that overcomes these pitfalls and thus enables meaningful statements about the performance of AL methods. To demonstrate the relevance of our protocol, we present a large-scale empirical study and benchmark for image classification spanning various data sets, query methods, AL settings, and training paradigms. Our findings clarify the inconsistent picture in the literature and enable us to give hands-on recommendations for practitioners.
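As a point of reference for the pool-based setting described above, the sketch below shows only the query step of one common AL strategy: entropy-based uncertainty sampling. This is a generic illustration and not the evaluation protocol or any specific query method from the paper. The function names `entropy` and `query_most_uncertain` are hypothetical. The sketch assumes the classifier has already produced softmax probabilities for every unlabeled sample. A full AL cycle would label the selected samples, add them to the training set, retrain, and repeat until the labeling budget is exhausted.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing and would break log().
    return -sum(p * math.log(p) for p in probs if p > 0)


def query_most_uncertain(pool_probs: Sequence[Sequence[float]], budget: int) -> list[int]:
    """Return indices of the `budget` pool samples with the highest predictive entropy.

    Hypothetical helper: `pool_probs[i]` is assumed to be the classifier's
    softmax output for unlabeled sample i.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical softmax outputs for four unlabeled samples.
    pool = [
        [0.98, 0.01, 0.01],  # confident prediction
        [0.34, 0.33, 0.33],  # near-uniform, most uncertain
        [0.60, 0.30, 0.10],
        [0.50, 0.50, 0.00],
    ]
    print(query_most_uncertain(pool, budget=2))  # [1, 2]
```

The ranking is a plain sort, so the query cost grows as O(n log n) in the pool size. That is adequate for a sketch, but large pools usually call for a partial selection instead.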
A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems
Mishra, Pranav Pushkar, Yeole, Kranti Prakash, Keshavamurthy, Ramyashree, Surana, Mokshit Bharat, Sarayloo, Fatemeh
In enterprise settings, efficiently retrieving relevant information from large and complex knowledge bases is essential for operational productivity and informed decision-making. This research presents a systematic framework for metadata enrichment using large language models (LLMs) to enhance document retrieval in Retrieval-Augmented Generation (RAG) systems. Our approach employs a comprehensive, structured pipeline that dynamically generates meaningful metadata for document segments, substantially improving their semantic representations and retrieval accuracy. Through extensive experiments, we compare three chunking strategies (semantic, recursive, and naive) and evaluate their effectiveness when combined with advanced embedding techniques. The results demonstrate that metadata-enriched approaches consistently outperform content-only baselines, with recursive chunking paired with TF-IDF weighted embeddings yielding an 82.5% precision rate compared to 73.3% for semantic content-only approaches. The naive chunking strategy with prefix-fusion achieved the highest Hit Rate@10 of 0.925. Our evaluation employs cross-encoder reranking for ground truth generation, enabling rigorous assessment via Hit Rate and Metadata Consistency metrics. These findings confirm that metadata enrichment enhances vector clustering quality while reducing retrieval latency, making it a key optimization for RAG systems across knowledge domains. This work offers practical insights for deploying high-performance, scalable document retrieval solutions in enterprise settings, demonstrating that metadata enrichment is a powerful approach for enhancing RAG effectiveness.
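To clarify the Hit Rate@10 figure quoted above, the sketch below computes Hit Rate@k under its usual definition: the fraction of queries whose top-k retrieved chunks contain at least one ground-truth relevant chunk. It assumes the ground truth (produced by cross-encoder reranking in the paper) is available as a set of relevant chunk IDs per query. The function name and the chunk IDs are hypothetical. The paper's exact matching rules may differ.

```python
from typing import Sequence, Set


def hit_rate_at_k(
    retrieved: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    k: int = 10,
) -> float:
    """Fraction of queries whose top-k retrieved chunk IDs include a relevant ID.

    `retrieved[q]` is the ranked list of chunk IDs returned for query q;
    `relevant[q]` is the (assumed) ground-truth set of relevant chunk IDs.
    """
    if len(retrieved) != len(relevant):
        raise ValueError("need exactly one relevance set per query")
    if not retrieved:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(retrieved, relevant)
        if any(chunk_id in gold for chunk_id in ranked[:k])
    )
    return hits / len(retrieved)


if __name__ == "__main__":
    # Hypothetical rankings for four queries and their relevant chunks.
    retrieved = [["c1", "c7", "c3"], ["c2", "c9"], ["c5", "c4", "c8"], ["c6"]]
    relevant = [{"c3"}, {"c4"}, {"c8"}, {"c6"}]
    print(hit_rate_at_k(retrieved, relevant, k=3))  # 0.75
    print(hit_rate_at_k(retrieved, relevant, k=1))  # 0.25
```

Because a query counts as a hit as soon as any relevant chunk appears in the cutoff, Hit Rate@k ignores where that chunk is ranked within the top k. Rank-sensitive metrics such as MRR complement it.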
- North America > United States > Arizona (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
VALID-Mol: a Systematic Framework for Validated LLM-Assisted Molecular Design
Malikussaid, Nuha, Hilal Hudan, Kurniawan, Isman
Large Language Models demonstrate substantial promise for advancing scientific discovery, yet their deployment in disciplines demanding factual precision and specialized domain constraints presents significant challenges. Within molecular design for pharmaceutical development, these models can propose innovative molecular modifications but frequently generate chemically infeasible structures. We introduce VALID-Mol, a comprehensive framework that integrates chemical validation with LLM-driven molecular design, raising the rate of valid chemical structure generation from 3% to 83%. Our methodology synthesizes systematic prompt optimization, automated chemical verification, and domain-adapted fine-tuning to ensure dependable generation of synthesizable molecules with enhanced properties. Our contribution extends beyond implementation details to provide a transferable methodology for scientifically constrained LLM applications with measurable reliability enhancements. Computational analyses indicate our framework generates promising synthesis candidates with up to 17-fold predicted improvements in target binding affinity while preserving synthetic feasibility.
- Asia > Indonesia > Java > West Java > Bandung (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Hudson County > Jersey City (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Weinheim (0.04)
EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models
Guo, Ping, Zhu, Chenyu, Chen, Siyuan, Liu, Fei, Lin, Xi, Lu, Zhichao, Zhang, Qingfu
CUDA kernel optimization has become a critical bottleneck for AI performance, as deep learning training and inference efficiency directly depends on highly optimized GPU kernels. Despite the promise of Large Language Models (LLMs) for automating kernel optimization, this field suffers from a fragmented ecosystem of isolated and incomparable approaches with unclear problem formulations. Furthermore, general-purpose LLM code evolution methods cannot meet the strict correctness requirements of CUDA kernel optimization. We address these fundamental challenges by first formalizing CUDA kernel optimization as a code optimization task with a clear objective, constraints, and evaluation metrics. We then establish EvoEngineer, the first systematic LLM-based code evolution framework, which provides guidance for designing and adapting optimization strategies that balance performance and correctness. Finally, we implement a kernel optimization system based on this framework and conduct extensive experiments on 91 real-world CUDA kernels. Our results demonstrate that EvoEngineer achieves a principled balance between performance and correctness, with the highest averaged median speedup of 2.72× over baseline CUDA kernels and a code validity rate of 69.8%, outperforming existing methods on both dimensions. Our method achieves a maximum speedup of 36.75× over PyTorch kernels across all operations and delivers the highest speedup on 28 (56.0%) of the 50 operations that achieve over 2× acceleration.
CUDA kernel performance has become the critical bottleneck constraining the efficiency of AI training and inference. As foundation models continue scaling to unprecedented sizes (Guo et al., 2025; Jaech et al., 2024), computational demands necessitate maximum GPU utilization efficiency, where even marginal improvements in kernel performance can yield substantial reductions in computational costs.
However, manual kernel optimization requires deep expertise across GPU architectures, memory hierarchies, parallelization patterns, and hardware-specific features (Navarro et al., 2020; Hennessy & Patterson, 2011), constituting a major obstacle to scaling AI systems. The kernel code optimization landscape presents extreme complexity, involving intricate tradeoffs between memory coalescing, thread divergence, occupancy optimization, and register usage (Ujaldón, 2016; Huang et al., 2021; Zhao et al., 2022).
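The abstract reports an "averaged median speedup" and a "code validity rate" without defining them. The sketch below shows one plausible reading, labeled as an assumption throughout. Speedup is taken as baseline runtime divided by candidate runtime. For each kernel, the median is taken over repeated trials, and those medians are then averaged across kernels. Validity rate is taken as the share of generated kernels that pass correctness checks. The function names and sample numbers are hypothetical, and the paper's exact aggregation (for example, how invalid candidates enter the speedup) may differ.

```python
from statistics import mean, median
from typing import Dict, List


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup of a candidate kernel: baseline runtime / candidate runtime."""
    return baseline_ms / candidate_ms


def averaged_median_speedup(speedups: Dict[str, List[float]]) -> float:
    """Assumed aggregation: median speedup per kernel over trials, then mean over kernels."""
    return mean(median(trials) for trials in speedups.values())


def validity_rate(valid_flags: List[bool]) -> float:
    """Assumed definition: share of generated kernels that compile and pass correctness tests."""
    return sum(valid_flags) / len(valid_flags)


if __name__ == "__main__":
    # Hypothetical per-trial speedups for two kernels.
    trials = {"matmul": [1.5, 2.0, 3.0], "softmax": [1.0, 1.2, 1.1]}
    print(averaged_median_speedup(trials))  # medians 2.0 and 1.1, mean ≈ 1.55
    print(validity_rate([True, False, True, True]))  # 0.75
    print(speedup(baseline_ms=2.0, candidate_ms=0.5))  # 4.0
```

Taking the median per kernel before averaging limits the influence of a single lucky or unlucky trial. The final mean across kernels can still be dominated by a few kernels with very large speedups.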
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
PairBench: A Systematic Framework for Selecting Reliable Judge VLMs
Feizi, Aarash, Rajeswar, Sai, Romero-Soriano, Adriana, Rabbany, Reihaneh, Gella, Spandana, Zantedeschi, Valentina, Monteiro, João
As large vision language models (VLMs) are increasingly used as automated evaluators, understanding their ability to effectively compare data pairs as instructed in the prompt becomes essential. To address this, we present PairBench, a low-cost framework that systematically evaluates VLMs as customizable similarity tools across various modalities and scenarios. Through PairBench, we introduce four metrics that represent key desiderata of similarity scores: alignment with human annotations, consistency for data pairs irrespective of their order, smoothness of similarity distributions, and controllability through prompting. Our analysis demonstrates that no model, whether closed- or open-source, is superior on all metrics; the optimal choice depends on an auto-evaluator's desired behavior (e.g., a smooth vs. a sharp judge), highlighting the risks of widespread adoption of VLMs as evaluators without thorough assessment. For instance, the majority of VLMs struggle to maintain symmetric similarity scores regardless of order. Additionally, our results show that the performance of VLMs on the metrics in PairBench closely correlates with popular benchmarks, showcasing its predictive power in ranking models.
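The order-consistency desideratum above can be illustrated with a simple check: query the judge on (a, b) and on (b, a), then count how often the two scores agree. The sketch below is a hypothetical illustration, not PairBench's actual metric definition. The `judge` callable stands in for a VLM prompted to return a similarity score, and the tolerance parameter `tol` is an assumption.

```python
from typing import Callable, Sequence, Tuple


def order_consistency(
    judge: Callable[[str, str], float],
    pairs: Sequence[Tuple[str, str]],
    tol: float = 0.0,
) -> float:
    """Fraction of pairs whose score is unchanged (within `tol`) when the order is swapped.

    `judge(a, b)` is a hypothetical stand-in for a VLM returning a similarity score.
    """
    if not pairs:
        return 0.0
    consistent = sum(1 for a, b in pairs if abs(judge(a, b) - judge(b, a)) <= tol)
    return consistent / len(pairs)


if __name__ == "__main__":
    # Toy judge with a built-in order bias: it weights the first item's length.
    def biased_judge(a: str, b: str) -> float:
        return len(a) / (len(a) + len(b))

    pairs = [("cat", "dog"), ("cat", "horse"), ("a", "b")]
    # Equal-length pairs score 0.5 in both orders; ("cat", "horse") gives 3/8 vs 5/8.
    print(order_consistency(biased_judge, pairs))
```

A perfectly symmetric judge scores 1.0 on any set of pairs. Raising `tol` makes the check tolerant of small numerical jitter, such as sampling noise in generated scores.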
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Alberta > Census Division No. 19 > Saddle Hills County (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
A Productive, Systematic Framework for the Representation of Visual Structure
We describe a unified framework for the understanding of structure representation in primate vision. A model derived from this framework is shown to be effectively systematic in that it has the ability to interpret and associate together objects that are related through a rearrangement of common "middle-scale" parts, represented as image fragments. The model addresses the same concerns as previous work on compositional representation through the use of what+where receptive fields and attentional gain modulation. It does not require prior exposure to the individual parts, and avoids the need for abstract symbolic binding. The focus of theoretical discussion in visual object processing has recently started to shift from problems of recognition and categorization to the representation of object structure.
A Productive, Systematic Framework for the Representation of Visual Structure
Edelman, Shimon, Intrator, Nathan
For example, priming in a subliminal perception task was found to be confined to a quadrant of the visual field [16]. The notion that the representation of an object may be tied to a particular location in the visual field where it is first observed is compatible with the concept of an object file, a hypothetical record created by the visual system for every encountered object, which persists as long as the object is observed. Moreover, location (as it figures in the CoF model) should be interpreted relative to the focus of attention, rather than retinotopically [17]. The idea that global relationships (hence, large-scale structure) have precedence over local ones [18], which is central to our approach, has withstood extensive testing in the past two decades. Even with the perceptual salience of the global and local structure equated, subjects are able to process the relations among elements before the elements themselves are identified [19]. More generally, humans are limited in their ability to represent spatial structure, in that the representation of spatial relations requires spatial attention.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- Asia > Middle East > Jordan (0.05)
- North America > United States > Texas > Taylor County (0.04)